Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor analyser node #256

Merged
merged 19 commits into from
Jan 28, 2023
Merged

Conversation

b-ma
Copy link
Collaborator

@b-ma b-ma commented Jan 25, 2023

Hey,

This is quite a big PR regarding the AnalyserNode and what we discussed in #253:

  • use a thread-safe buffering queue (Arc<AtomicF32>) to ship audio data from audio thread to control thread
  • all computations (FFT) are now done in control thread
  • implement missing methods
  • follow spec according to recomputation of FFT if in same render quantum, etc.
  • added a very simple example (analyser) to complete the quite complex mic_playback one

I'm not completely sure that what have done for thread safety (i.e. moving the node to a different thread as you have done in the mic_playback example) is the cleanest / simplest possible way, but it builds and seems to work quite well. But any case having your thoughts on this point would be nice.

Let me know what you think

@b-ma b-ma requested a review from orottier January 25, 2023 14:48
@github-actions

This comment was marked as outdated.

@b-ma
Copy link
Collaborator Author

b-ma commented Jan 25, 2023

Just added a patch for #236 too

@github-actions

This comment was marked as outdated.

Copy link
Owner

@orottier orottier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I took a first look and I think this very well reflects the 'safe' version we have discussed. Really nice!

One important thing to notice is that the outputs between the current version and yours differ by a lot. Try running your analyser example on main and your branch and see what I mean. You also notice this in the mic_playback example. I wonder if maybe the current version is broken?

I left some small nitpicks that could further improve performance. I will also reply to your comments in the issue in a minute.

In any case, please proceed with your adventures

src/node/analyser.rs Outdated Show resolved Hide resolved
src/analysis.rs Outdated

// single producer / single consumer ring buffer
pub(crate) struct AnalyserRingBuffer {
buffer: Arc<Vec<AtomicF32>>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Arc is not necessary

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively, make it like this:

#[derive(Clone)]
pub(crate) struct AnalyserRingBuffer {
    buffer: Arc<[AtomicF32]>,
    write_index: Arc<AtomicUsize>,
}

Then you never need to wrap the thing inside an Arc, just clone them. I think it saves a level of indirection so it is more performant

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can create an Arc<[..]> with Arc::from(&vec_of_atomics[..]). (you need an intermediate allocation that hopefully the compiler will skip)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I did the alternative but kept the Arc<Vec<AtomicF32>>. I didn't manage to create the array from the Vec, with your snippet the build failed because it gives Arc<&[AtomicF32]> rather than `Arc<[AtomicF32]>.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was easier than I thought: 1c29c2f
Showing a nice bit of extra performance, before:

bench_analyser_node
  Instructions:            41181322 (-18.52477%)
  L1 Accesses:             58665001 (-17.67864%)
  L2 Accesses:               154232 (-63.80796%)
  RAM Accesses:               84974 (-1.944403%)
  Estimated Cycles:        62410251 (-18.34027%)

after:

bench_analyser_node
  Instructions:            39385732 (-22.07732%)
  L1 Accesses:             56186630 (-21.15673%)
  L2 Accesses:               158969 (-62.67475%)
  RAM Accesses:               84981 (-1.936325%)
  Estimated Cycles:        59955810 (-21.55078%)

src/analysis.rs Show resolved Hide resolved
src/analysis.rs Show resolved Hide resolved
@github-actions

This comment was marked as outdated.

@b-ma
Copy link
Collaborator Author

b-ma commented Jan 26, 2023

One important thing to notice is that the outputs between the current version and yours differ by a lot. Try running your analyser example on main and your branch and see what I mean. You also notice this in the mic_playback example. I wonder if maybe the current version is broken?

Hum weird indeed, I will have a look

@github-actions

This comment was marked as outdated.

@b-ma
Copy link
Collaborator Author

b-ma commented Jan 26, 2023

Ok, I checked the values and they are for sure wrong in main branch as there is no way a sine can trigger a 20dB peak. So I just recreated the example in JS to check the values of current branch against Chrome and Firefox and it appears consistent (10th bin converging around -14dB).

So I think we are all good on this side!

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions
Copy link

Benchmark result:


bench_ctor
  Instructions:             5131764 (-0.363693%)
  L1 Accesses:              8195426 (-0.364311%)
  L2 Accesses:                11487 (-0.060901%)
  RAM Accesses:               54580 (+0.020158%)
  Estimated Cycles:        10163161 (-0.290559%)

bench_sine
  Instructions:            79773062 (-0.037481%)
  L1 Accesses:            117328679 (-0.043396%)
  L2 Accesses:               229609 (-0.717779%)
  RAM Accesses:               56835 (+0.026399%)
  Estimated Cycles:       120465949 (-0.048715%)

bench_sine_gain
  Instructions:            84698821 (-0.044095%)
  L1 Accesses:            124784010 (-0.042897%)
  L2 Accesses:               261761 (-5.106454%)
  RAM Accesses:               56138 (+0.028509%)
  Estimated Cycles:       128057645 (-0.096287%)

bench_sine_gain_delay
  Instructions:           156998660 (-0.030909%)
  L1 Accesses:            224136336 (-0.037908%)
  L2 Accesses:               700265 (-0.727393%)
  RAM Accesses:               58232 (+0.022329%)
  Estimated Cycles:       229675781 (-0.047957%)

bench_buffer_src
  Instructions:            19293327 (-0.154767%)
  L1 Accesses:             29494191 (-0.167701%)
  L2 Accesses:                67670 (-3.951514%)
  RAM Accesses:               83903 (+0.010728%)
  Estimated Cycles:        32769146 (-0.192341%)

bench_buffer_src_delay
  Instructions:            90894541 (-0.045187%)
  L1 Accesses:            128271860 (-0.067991%)
  L2 Accesses:               182125 (+7.333762%)
  RAM Accesses:               84074 (+0.007137%)
  Estimated Cycles:       132125075 (-0.018799%)

bench_buffer_src_iir
  Instructions:            52313605 (-0.064319%)
  L1 Accesses:             74177253 (-0.068871%)
  L2 Accesses:                68994 (-11.21606%)
  RAM Accesses:               83858 (+0.022662%)
  Estimated Cycles:        77457253 (-0.121258%)

bench_buffer_src_biquad
  Instructions:            40098128 (-0.120924%)
  L1 Accesses:             57945706 (-0.103621%)
  L2 Accesses:               138808 (-17.78092%)
  RAM Accesses:               83943 (+0.007148%)
  Estimated Cycles:        61577751 (-0.339859%)

bench_stereo_positional
  Instructions:            48506513 (-0.191923%)
  L1 Accesses:             73409014 (-0.269598%)
  L2 Accesses:               700237 (+2.369341%)
  RAM Accesses:               84225 (+0.007124%)
  Estimated Cycles:        79858074 (-0.146544%)

bench_stereo_panning_automation
  Instructions:            33364560 (-0.111856%)
  L1 Accesses:             51135581 (-0.136540%)
  L2 Accesses:               125912 (+1.952211%)
  RAM Accesses:               83326 (+0.018005%)
  Estimated Cycles:        54681551 (-0.104745%)

bench_analyser_node
  Instructions:            41181322 (-18.52477%)
  L1 Accesses:             58665001 (-17.67864%)
  L2 Accesses:               154232 (-63.80796%)
  RAM Accesses:               84974 (-1.944403%)
  Estimated Cycles:        62410251 (-18.34027%)


-    buffer: Arc<Vec<AtomicF32>>,
+    buffer: Arc<[AtomicF32]>,
@github-actions
Copy link

Benchmark result:


bench_ctor
  Instructions:             5128010 (-0.436579%)
  L1 Accesses:              8195407 (-0.364542%)
  L2 Accesses:                11499 (+0.043501%)
  RAM Accesses:               54578 (+0.016493%)
  Estimated Cycles:        10163132 (-0.290843%)

bench_sine
  Instructions:            79758044 (-0.056300%)
  L1 Accesses:            117325138 (-0.046413%)
  L2 Accesses:               229617 (-0.714320%)
  RAM Accesses:               56832 (+0.021119%)
  Estimated Cycles:       120462343 (-0.051706%)

bench_sine_gain
  Instructions:            84676294 (-0.070679%)
  L1 Accesses:            124764297 (-0.058688%)
  L2 Accesses:               278045 (+0.796819%)
  RAM Accesses:               56138 (+0.028509%)
  Estimated Cycles:       128119352 (-0.048147%)

bench_sine_gain_delay
  Instructions:           156964868 (-0.052426%)
  L1 Accesses:            224138492 (-0.036947%)
  L2 Accesses:               694777 (-1.505396%)
  RAM Accesses:               58231 (+0.020612%)
  Estimated Cycles:       229650462 (-0.058975%)

bench_buffer_src
  Instructions:            19278293 (-0.232554%)
  L1 Accesses:             29490560 (-0.179987%)
  L2 Accesses:                67487 (-4.212618%)
  RAM Accesses:               83908 (+0.016688%)
  Estimated Cycles:        32764775 (-0.205666%)

bench_buffer_src_delay
  Instructions:            90868282 (-0.074069%)
  L1 Accesses:            128272440 (-0.067546%)
  L2 Accesses:               177874 (+4.828472%)
  RAM Accesses:               84079 (+0.014274%)
  Estimated Cycles:       132104575 (-0.034291%)

bench_buffer_src_iir
  Instructions:            52294855 (-0.100177%)
  L1 Accesses:             74174091 (-0.073158%)
  L2 Accesses:                68382 (-12.00360%)
  RAM Accesses:               83859 (+0.022662%)
  Estimated Cycles:        77451066 (-0.129307%)

bench_buffer_src_biquad
  Instructions:            40064336 (-0.205103%)
  L1 Accesses:             57942161 (-0.109730%)
  L2 Accesses:               139008 (-17.66246%)
  RAM Accesses:               83951 (+0.015488%)
  Estimated Cycles:        61575486 (-0.343579%)

bench_stereo_positional
  Instructions:            48427666 (-0.354209%)
  L1 Accesses:             73402508 (-0.278472%)
  L2 Accesses:               699784 (+2.303115%)
  RAM Accesses:               84238 (+0.022560%)
  Estimated Cycles:        79849758 (-0.156975%)

bench_stereo_panning_automation
  Instructions:            33342084 (-0.179292%)
  L1 Accesses:             51129769 (-0.148002%)
  L2 Accesses:               128221 (+3.821831%)
  RAM Accesses:               83331 (+0.025207%)
  Estimated Cycles:        54687459 (-0.093992%)

bench_analyser_node
  Instructions:            39385732 (-22.07732%)
  L1 Accesses:             56186630 (-21.15673%)
  L2 Accesses:               158969 (-62.67475%)
  RAM Accesses:               84981 (-1.936325%)
  Estimated Cycles:        59955810 (-21.55078%)


src/analysis.rs Outdated
// If not wrapped into Arc<Mutex<T>>, compiler complains about thread safety:
// > `(dyn rustfft::avx::avx_planner::AvxPlannerInternalAPI<f32> + 'static)`
// > cannot be shared between threads safely
// But the Mutex is ok here as `Analyser` lives outside the audio thread.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is necessary? It works on my hardware without the Arc+Mutex.
It would also surprise me, Mutex does not magically make non-Send types Send. Being not-Send means you cannot leave the thread you were created on. Arc+Mutex specifically allows one to transfer objects from one thread to another

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the liberty to remove them. Let me know if building fails on your hardware

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, it fails on CI. The Mutex is required to make the thing Sync

@github-actions
Copy link

Benchmark result:


The Mutex is necessary on avx-enabled systems

This reverts commit 589985b.
@orottier
Copy link
Owner

So to summarize, this PR

  • fixes a giant bug in calculation of the actual frequency data
  • shows a 20% instruction count reduction on the render thread
  • fixes a weird construct where the control thread would need to wait a render quantum before the FFT data is shipped
  • adds a lot of tests

I could not be more happier

@github-actions
Copy link

Benchmark result:


bench_ctor
  Instructions:             5128010 (-0.436579%)
  L1 Accesses:              8195409 (-0.364518%)
  L2 Accesses:                11502 (+0.069602%)
  RAM Accesses:               54573 (+0.007330%)
  Estimated Cycles:        10162974 (-0.292393%)

bench_sine
  Instructions:            79758044 (-0.056300%)
  L1 Accesses:            117323179 (-0.048082%)
  L2 Accesses:               231583 (+0.135773%)
  RAM Accesses:               56825 (+0.008800%)
  Estimated Cycles:       120469969 (-0.045379%)

bench_sine_gain
  Instructions:            84676294 (-0.070679%)
  L1 Accesses:            124766292 (-0.057090%)
  L2 Accesses:               276059 (+0.076854%)
  RAM Accesses:               56129 (+0.012473%)
  Estimated Cycles:       128111102 (-0.054583%)

bench_sine_gain_delay
  Instructions:           156964868 (-0.052426%)
  L1 Accesses:            224139443 (-0.036522%)
  L2 Accesses:               693831 (-1.639505%)
  RAM Accesses:               58226 (+0.012024%)
  Estimated Cycles:       229646508 (-0.060696%)

bench_buffer_src
  Instructions:            19278293 (-0.232570%)
  L1 Accesses:             29490470 (-0.180299%)
  L2 Accesses:                67592 (-4.063587%)
  RAM Accesses:               83893 (+0.001192%)
  Estimated Cycles:        32764685 (-0.205734%)

bench_buffer_src_delay
  Instructions:            90868295 (-0.074055%)
  L1 Accesses:            128273421 (-0.066781%)
  L2 Accesses:               176922 (+4.267419%)
  RAM Accesses:               84064 (-0.004758%)
  Estimated Cycles:       132100271 (-0.037574%)

bench_buffer_src_iir
  Instructions:            52294863 (-0.100097%)
  L1 Accesses:             74174109 (-0.073085%)
  L2 Accesses:                68390 (-11.99331%)
  RAM Accesses:               83840 (-0.001193%)
  Estimated Cycles:        77450459 (-0.130088%)

bench_buffer_src_biquad
  Instructions:            40064323 (-0.205200%)
  L1 Accesses:             57940235 (-0.113097%)
  L2 Accesses:               140933 (-16.52224%)
  RAM Accesses:               83938 (-0.001191%)
  Estimated Cycles:        61582730 (-0.331955%)

bench_stereo_positional
  Instructions:            48427682 (-0.354117%)
  L1 Accesses:             73404114 (-0.276242%)
  L2 Accesses:               698214 (+2.073593%)
  RAM Accesses:               84216 (-0.005937%)
  Estimated Cycles:        79842744 (-0.165789%)

bench_stereo_panning_automation
  Instructions:            33342053 (-0.179331%)
  L1 Accesses:             51130907 (-0.145732%)
  L2 Accesses:               127063 (+2.884187%)
  RAM Accesses:               83316 (+0.003601%)
  Estimated Cycles:        54682282 (-0.103597%)

bench_analyser_node
  Instructions:            39385399 (-22.07794%)
  L1 Accesses:             56195598 (-21.14385%)
  L2 Accesses:               149644 (-64.88450%)
  RAM Accesses:               84973 (-1.942162%)
  Estimated Cycles:        59917873 (-21.60130%)


@orottier orottier merged commit a234159 into orottier:main Jan 28, 2023
@b-ma b-ma deleted the feat/refactor-analyser-node branch November 4, 2023 06:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants